Abstract

We describe a dynamic programming algo- 
rithm for computing the marginal distri- 
bution of discrete probabilistic programs. 
This algorithm takes a functional inter- 
preter for an arbitrary probabilistic program- 
ming language and turns it into an effi- 
cient marginalizer. Because direct caching 
of sub-distributions is impossible in the pres- 
ence of recursion, we build a graph of de- 
pendencies between sub-distributions. This 
factored sum-product network makes (poten- 
tially cyclic) dependencies between subprob- 
lems explicit, and corresponds to a system of 
equations for the marginal distribution. We 
solve these equations by fixed-point iteration 
in topological order. We illustrate this algo- 
rithm on examples used in teaching proba- 
bilistic models, computational cognitive sci- 
ence research, and game theory. 



1 INTRODUCTION 

Probabilistic programming allows rapid prototyping 
of complexly structured probabilistic models without 
requiring the design of model-specific inference algo- 
rithms. This makes probabilistic programs attractive 
for scientific research: when hypotheses are formalized 
as programs, it is possible to quickly explore the space 
of hypotheses. The same features make probabilistic 
programs compelling for education: students can fo- 
cus on understanding modeling and inference patterns 
before they need to learn about inference implementa- 
tions. 

However, the performance of current inference algo- 
rithms for generic probabilistic programs can vary 
greatly between models, even for models with a very 
small number of random choices. This presents an obstacle
to the use of probabilistic programs in research
and teaching. In fact, many of the models used in these
domains are small enough that exact computation is 
feasible in principle, but they often exhibit patterns, 
such as nested conditioning, that make naive enumer- 
ation intractable. 

In this paper we develop a generic dynamic program- 
ming algorithm, which expands the applicability of ex- 
act inference for probabilistic programs. Given an in- 
terpreter for an arbitrary probabilistic programming 
language and a discrete probabilistic program, this 
algorithm computes the marginal distribution of the 
program — i.e., its distribution on return values — while 
sharing subcomputations where possible. By viewing 
conditioning as marginalization of a rejection sampler, 
this captures the full range of probabilistic operations 
over arbitrary models. 

The key obstacle to dynamic programming, which is 
neither present in caching deterministic interpreters 
nor in dynamic programming algorithms for more re- 
stricted model classes, is the possibility of stochastic 
self-recursion: an interpreter call with particular ar- 
guments can result in a call with the same arguments. 
Figure 1a shows a program that exhibits this property.
This is not a corner case: for instance, all models
that implement conditioning via rejection sampling
have this property (Figure 1b).

To make dynamic programming possible in the pres- 
ence of recursion, we first compile the given probabilis- 
tic program to an intermediate representation that rei- 
fies dependencies between sub-distributions. We then 
compute the marginal distribution from this representation.
Our intermediate representation is a generalization
of sum-product networks (Poon and Domingos, 2011)
that makes dependencies, including recursive
dependencies, explicit: a factored sum-product
network (FSPN). While computing the distribution implied
by a sum-product network is linear in the size of
the network, FSPNs are more difficult to solve in general.
We solve FSPNs by clustering their vertices into
strongly connected components and by solving each 



(define (game player)
  (if (flip .6)
      (not (game (not player)))
      (if player
          (flip .2)
          (flip .7))))

(game true)

(a) A simple game

(define (rejection joint condition?)
  (let ([sample (joint)])
    (if (condition? sample)
        sample
        (rejection joint condition?))))

(b) Conditioning

Figure 1: Recursive probabilistic programs

component using fixed-point iteration. 

In the following, we first describe the structures our al- 
gorithm operates on: probabilistic programs, their in- 
terpreters, and FSPNs. We then present the two steps 
of our algorithm, compilation of programs to FSPNs 
and computation of marginal distributions given a 
FSPN. We demonstrate the algorithm on examples 
used in teaching, cognitive science research, and game 
theory, and explain what makes it attractive in each 
case. We relate the algorithm to the literature and 
conclude with future research directions. 

2 PROBABILISTIC PROGRAMS 

A probabilistic program is a program in a language 
with primitives for sampling from distributions such 
as Bernoulli and multinomial. Probabilistic programs 
describe generative models and thus denote distribu- 
tions. An interpreter specifies this denotation by im- 
plementing a process that, given a program, generates 
samples from the program's distribution. For example,
an interpreter for the Church language (Goodman et al., 2008)
takes a program expression and environment,
and returns a sample from the program's distribution
on Church values. This sample is generated
using recursive calls to the interpreter, with each sub- 
call defining a distribution on values and resulting in 
a sample from this sub-distribution. 

The problem of inference for generative models is commonly
formulated in terms of conditioning. However,
for any conditional distribution there is an equivalent 
unconditioned model that samples outcomes with the 
same probabilities. Inference can be understood as the 
problem of marginalization of this new model. 

A simple way to construct a generative model which
samples from some conditional distribution is via rejection
sampling. Figure 1b shows how this works in the
Church language. Assume that the procedure joint 
draws samples from some joint distribution and that 
condition? is a predicate which checks whether some 
condition holds for each sample. The recursive procedure
rejection draws samples from joint conditioned
on condition?. Crucially, this procedure makes use of
no special conditioning operator but samples directly 
from the conditional distribution of interest. 

Of course, drawing conditional samples using 
rejection is very inefficient: In general, we may have 
to tolerate an exponential number of rejected samples 
before the condition is satisfied. However, if we could 
efficiently marginalize the rejection procedure, elim- 
inating all zero-probability paths, then we would have 
solved our target inference problem. 
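
As a rough illustration of the rejection procedure of Figure 1b, here is a minimal Python sketch. The recursion is written as a loop, and the joint distribution `two_flips` and predicate `at_least_one_true` are hypothetical stand-ins chosen for this example only; they do not appear in the paper.

```python
import random

def rejection(joint, condition, rng):
    """Draw one sample from `joint` conditioned on `condition`.

    Iterative form of the recursive Church procedure: keep sampling
    until the predicate accepts the sample.
    """
    while True:
        sample = joint(rng)
        if condition(sample):
            return sample

def two_flips(rng):
    # Hypothetical joint distribution: two independent fair coin flips.
    return (rng.random() < 0.5, rng.random() < 0.5)

def at_least_one_true(sample):
    # Hypothetical condition: at least one of the two flips came up true.
    return sample[0] or sample[1]

rng = random.Random(0)
samples = [rejection(two_flips, at_least_one_true, rng) for _ in range(10000)]
# Empirical estimate of P(both true | at least one true); the exact value is 1/3.
freq_both = sum(1 for a, b in samples if a and b) / len(samples)
```

Marginalizing this procedure exactly, rather than sampling from it, is what removes the cost of the rejected runs.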

For many probabilistic programs, efficient marginal- 
ization of this sort is possible. We are interested in 
programs for which many different executions share 
substructure. Problems with this character are clas- 
sically amenable to dynamic programming. Recursive 
programs, like rejection, which involve multiple ex- 
ecutions of the same procedure application, provide 
a particularly rich opportunity to exploit shared sub- 
structure. We will focus on these cases in the examples 
below. 

3 FACTORED SUM-PRODUCT 
NETWORKS 

In the process of computing the marginal distribution
for a given probabilistic program, we use factored
sum-product networks as an intermediate representation
between the original program and the marginal distribution.
Like sum-product networks (Poon and Domingos, 2011),
this representation factors out all "deterministic"
computation, leaving only probability calculations,
i.e., sums and products. In addition, it makes
explicit the dependencies between the distributions
resulting from subcomputations.

Definition 1. A factored sum-product network
(FSPN) over variables $x_1, \dots, x_d$ is a directed graph
with a uniquely labeled root node $r$. The internal
nodes are sums and products. The leaves are indicators
$x_1, \dots, x_d$ and $\bar{x}_1, \dots, \bar{x}_d$, and reference nodes $(y, \mathbf{x})$,
where $y$ is another node and $\mathbf{x}$ a vector of indicator
values. Each edge $(i, j)$ from a sum node $i$ has a
non-negative weight $w_{ij}$.

Let $Ch(y)$ denote the children of node $y$. The value
$V(y, \mathbf{x})$ of a node $y$ is defined as $\sum_{z \in Ch(y)} w_{yz} V(z, \mathbf{x})$
if $y$ is a sum, as $\prod_{z \in Ch(y)} V(z, \mathbf{x})$ if $y$ is a product,
as $\mathbb{1}_{x_j = \mathbf{x}_j}$ if $y$ is an indicator $x_j$, and as $V(z, \mathbf{w})$



Figure 2: The factored sum-product network corresponding to the game program (Figure 1a), showing sums,
products, indicators, and reference nodes. P(r = v) references the probability of value v under node r.



if $y$ is a reference $(z, \mathbf{w})$. This defines a system of
equations. We denote the factored sum-product network
$F$ as a function of the indicator variables
$\mathbf{x} = (x_1, \dots, x_d, \bar{x}_1, \dots, \bar{x}_d)$ by $F(\mathbf{x}) = V(r, \mathbf{x})$. For any
given $\mathbf{x}$, the value of the FSPN is the solution to the
system of equations $F(\mathbf{x})$ (if a unique solution exists).
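
As a small worked example, the system for the game program of Figure 1a can be written out by hand. The shorthand $p = P(\text{(game true)} = \text{true})$ and $q = P(\text{(game false)} = \text{false})$ is introduced here for illustration only. Reading the two branches of the program directly gives a pair of mutually dependent equations:

```latex
\begin{align}
p &= 0.6\, q + 0.4 \cdot 0.2, \\
q &= 0.6\, p + 0.4 \cdot 0.3.
\end{align}
% Substituting the second equation into the first:
\begin{align}
p &= 0.6\,(0.6\, p + 0.12) + 0.08 = 0.36\, p + 0.152, \\
p &= 0.152 / 0.64 = 0.2375, \qquad q = 0.6 \cdot 0.2375 + 0.12 = 0.2625.
\end{align}
```

The cyclic dependence between $p$ and $q$ is the kind of recursion that the reference nodes make explicit; here the system happens to be linear, so it has a unique solution.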

4 ALGORITHM 

4.1 Overview 

Our algorithm solves the following problem: Given a 
functional interpreter for a probabilistic programming 
language, how can we build an efficient marginalizer? 
In other words: How can we turn a universal sampler 
into a generic dynamic programming algorithm? 

If the interpreter were deterministic, we could just 
memoize it. If it were an interpreter for a stochas- 
tic language that does not allow self-recursion, i.e., 
where an interpreter call can never result in an in- 
terpreter call with the same arguments, we could use 
the interpreter to recursively compute and cache the 
distribution for each unique interpreter call. For lan- 
guages that allow self-recursion, direct caching is no 
longer feasible, since it could lead to infinite regress. 

On a high level, our approach is this: our algorithm 
takes as input the interpreter and a program, and 
builds a factored sum-product network, which can then 
be solved using methods such as fixed-point iteration. 
We intercept any recursive calls the interpreter makes 
to itself and any calls it makes to its source of random- 
ness, and build network structure that reflects these 
calls. The FSPN constructed in this way describes the 
marginal distribution of the interpreted program. 

4.2 Interpreters as Factored Coroutines 

As a prerequisite for the description of our algorithm, 
we now present mathematical objects that formalize 
the idea of an interpreter as a coroutine. 



Let $\mathcal{V}$ be a countable domain of values that a probabilistic
program can return, and let $\mathcal{P}(\mathcal{V})$ be the domain
of probability measures on $\mathcal{V}$.

Let $\mathcal{C}$, $\mathcal{R}$, $\mathcal{S}$, and $\mathcal{X}$ denote the (as yet undefined)
domains of continuations, random choices, subcalls, and
partial results. Loosely speaking, (1) continuations are
functions from values to partial results, (2) random
choices are pairs of continuations and distributions on
values, (3) subcalls are pairs of continuations and partial
results, and (4) partial results are either values,
random choices, or subcalls.

Formally, let $\mathcal{C}$, $\mathcal{R}$, $\mathcal{S}$, and $\mathcal{X}$ be the smallest domains
satisfying the recursive domain equations:

\begin{align}
\mathcal{C} &= \mathcal{V} \to \mathcal{X} \tag{1} \\
\mathcal{R} &= \mathcal{C} \times \mathcal{P}(\mathcal{V}) \tag{2} \\
\mathcal{S} &= \mathcal{C} \times \mathcal{X} \tag{3} \\
\mathcal{X} &= \mathcal{V} \cup \mathcal{R} \cup \mathcal{S} \tag{4}
\end{align}

An interpreter is a function from partial results to par- 
tial results. 
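
To make the coroutine view concrete, the following Python sketch encodes partial results as plain data and writes the game program of Figure 1a in that form. The class names `RandomChoice` and `Subcall`, the helper `flip`, and the encoding of an interpreter argument as a tagged tuple are assumptions of this sketch. Following the description of Algorithm 1, a subcall here carries an interpreter argument rather than an already-built partial result.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple, Union

@dataclass(frozen=True)
class RandomChoice:
    cont: Callable[[Any], "Partial"]   # continuation: value -> partial result
    values: Tuple[Any, ...]            # support of the distribution
    probs: Tuple[float, ...]           # probability of each value

@dataclass(frozen=True)
class Subcall:
    cont: Callable[[Any], "Partial"]   # continuation: return value -> partial result
    arg: Any                           # interpreter argument of the recursive call

# A partial result is a plain value, a random choice, or a subcall.
Partial = Union[bool, RandomChoice, Subcall]

def flip(p, cont):
    """Cede control at a Bernoulli choice instead of sampling it."""
    return RandomChoice(cont, (True, False), (p, 1.0 - p))

def game(player):
    """The game program of Figure 1a, written in coroutine form."""
    def after_first_flip(recurse):
        if recurse:
            # (not (game (not player))): hand the recursive call to the caller.
            return Subcall(lambda v: not v, ("game", not player))
        return flip(0.2 if player else 0.7, lambda v: v)
    return flip(0.6, after_first_flip)
```

Because the interpreter never samples or recurses on its own, a driver can decide whether to sample (`random.choices` over `values` and `probs`) or to enumerate every branch.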

4.3 Compiling Programs to FSPNs 

BuildFSPN, shown in Algorithm 1, takes as arguments 
an interpreter in factored coroutine form and an initial
interpreter argument $x_{\text{init}}$. The algorithm steps
through all possible execution paths while building 
the corresponding factored sum-product network, but 
avoiding duplicate evaluation of subproblems. We first 
describe three ingredients for this procedure — the task 
queue, constant-time subcall identification, and factor- 
ization grain — then the algorithm BuildFSPN. 

Task queue. In programs with self-recursive calls, 
the exploration order of different execution paths can 
be highly constrained. For example, in order to evaluate
the first if-branch of (game true) in Figure 1a,
we need to know at least one of the return values of
(game (not true)), but these in turn depend on the
return values of (game true). In order to let program
exploration be guided by what return values are
known, we maintain a map terminals, which maps 
each root node to all known terminal values reachable 
from it, and a map callbacks, which maps each root 
node to a list of callbacks. A callback is a pair of a 
node n and a continuation c. When a new terminal 
v is found below a root node associated with callback 
(n, c), the call c(v) is used to continue evaluation and 
network building in the original context. 

Constant-time subcall identification. At each 
subcall, we need to determine whether the subcall is 
new or whether it has already been assigned a FSPN 
node. For the algorithm to have constant-time over- 
head over steps of the underlying interpreter, it is cru- 
cial that this computation takes place in constant time, 
i.e., it must not depend on the size of the interpreter 
arguments. This suggests the use of an underlying interpreter
that represents values in a compressed way,
e.g., using the value-number technique described in
Aho et al. (2007). In Algorithm 1, subproblem maps
interpreter arguments to network nodes.
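
One way to obtain such compressed values is to intern every structure bottom-up so that structurally equal arguments share a small integer id. The sketch below is only an illustration of that idea; the class `ValueNumbering` and the string tags are hypothetical and are not taken from the paper's implementation.

```python
class ValueNumbering:
    """Assign one integer id to each distinct (tag, child ids...) structure."""

    def __init__(self):
        self._ids = {}      # structure key -> id
        self._values = []   # id -> structure key

    def intern(self, tag, *child_ids):
        # The key holds only a tag and the ids of already interned children,
        # so its size depends on the arity of the node, not on the size of
        # the whole value it stands for.
        key = (tag,) + child_ids
        if key not in self._ids:
            self._ids[key] = len(self._values)
            self._values.append(key)
        return self._ids[key]

vn = ValueNumbering()
call_a = vn.intern("app", vn.intern("sym:game"), vn.intern("const:true"))
call_b = vn.intern("app", vn.intern("sym:game"), vn.intern("const:true"))
call_c = vn.intern("app", vn.intern("sym:game"), vn.intern("const:false"))

# The subproblem map can then be keyed by integer ids.
subproblem = {call_a: "root-node-0"}
```

With this representation, checking whether a subcall is already known is a dictionary lookup on an integer.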

Factorization grain. There are two ways to deter- 
mine how much information sharing takes place: (1) 
While the interpreter may cede control at all recursive 
calls, it does not need to for our algorithm to be valid. 
There is a continuum between building a fully factored 
FSPN and building a tree of random choices without 
factorization. In our experiments, we have found it 
advantageous to factor at all calls that correspond to 
function applications. (2) What information the un- 
derlying interpreter passes to its recursive calls affects 
sharing. In Church, where interpreter arguments con- 
sist of expressions and environments, restricting envi- 
ronments to relevant environments is critical for effi- 
cient dynamic programming. 
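
To illustrate point (2), the sketch below restricts an environment to the symbols that occur free in an expression before the pair is used as a subproblem key. The tuple encoding of expressions and the helper names `free_symbols` and `restrict` are assumptions made for this example, not the representation used by the Church interpreter.

```python
def free_symbols(expr):
    """Symbols occurring free in a tuple-encoded expression.

    Strings are symbols, tuples are applications, and
    ("lambda", params, body) binds its parameters.
    """
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        if len(expr) == 3 and expr[0] == "lambda":
            _, params, body = expr
            return free_symbols(body) - set(params)
        found = set()
        for part in expr:
            found |= free_symbols(part)
        return found
    return set()  # numbers, booleans, and other constants

def restrict(env, expr):
    """Hashable view of `env` limited to the symbols `expr` can refer to."""
    keep = free_symbols(expr)
    return tuple(sorted((k, v) for k, v in env.items() if k in keep))

expr = ("game", ("not", "player"))
key_1 = (expr, restrict({"player": True, "unused": 1}, expr))
key_2 = (expr, restrict({"player": True, "unused": 2}, expr))
```

The two keys coincide although the full environments differ, so both calls map to the same subproblem.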

Algorithm. Our algorithm maintains a queue of
tasks, initialized to a single task for the first interpreter
call. Each task is a tuple of a thunk $f$ (a function
without arguments), a previous node $n_{\text{prev}}$, and
an edge weight $w_{\text{prev}}$ (a probability). While the queue
is not empty, the algorithm takes the first task in the
queue and evaluates the function call $f()$. There are
three types of return values (partial results): subcalls, 
random choices, and terminal values. We process the 
value according to its type: 

Subcalls are pairs of a continuation $c$ and an interpreter
argument $s$. At subcalls, we build a sum node $n_{\text{cur}}$ that
will have one child for each return value of the subcall.

If this is a new subcall, we add a new root node to the 
network and add to the queue the task of exploring 
this subcall, starting from the root node. 

If this is a known subcall, we look up what root node 



Algorithm 1: Compiling probabilistic programs
to factored sum-product networks

procedure BuildFSPN(I, x_init)
    G = Graph()
    r = G.addNode(root)
    Q = [(λ.I(x_init), r, 1.0)]
    terminals, callbacks, subproblem = {}, {}, {}
    while Q is not empty do
        (f, n_prev, w_prev) = Q.pop()
        x = f()
        if x is a value v then
            n_cur = G.addNode(indicator, v)
            r = G.root[n_prev]
            if v ∉ terminals[r] then
                for all (n', c) in callbacks[r] do
                    ProcessTerminal(G, Q, r, v, n', c)
                terminals[r].add(v)
        else if x is a random choice (c, v, p) then
            n_cur = G.addNode(sum)
            for all (v_i, p_i) ∈ (v, p) do
                Q.enqueue(λ.c(v_i), n_cur, p_i)
        else if x is a subcall (c, s) then
            n_cur = G.addNode(sum)
            if s ∉ subproblem.keys() then
                r = G.addNode(root)
                subproblem[s] = r
                Q.enqueue(λ.I(s), r, 1.0)
            else
                r = subproblem[s]
                for all v ∈ terminals[r] do
                    ProcessTerminal(G, Q, r, v, n_cur, c)
            callbacks[r].add((n_cur, c))
        G.addEdge(n_prev, n_cur, w_prev)
    return G
end procedure

procedure ProcessTerminal(G, Q, n_root, v, n_prev, c)
    n_prod = G.addNode(product)
    n_ref = G.addNode(ref, n_root, v)
    G.addEdge(n_prev, n_prod, 1.0)
    G.addEdge(n_prod, n_ref, 1.0)
    Q.enqueue(λ.c(v), n_prod, 1.0)
end procedure



it corresponds to and process all return values known 
for this subcall. For each such value, we add a product 
and reference node, and add to the queue the task of 
continuing evaluation using the continuation c. 

Finally, we store in callbacks for the root node $r$ the
continuation $c$ together with $n_{\text{cur}}$ such that, when new
return values for the subcall are found, we can continue
building the network in the current context.

Random choices are tuples of a continuation $c$, values
$v$, and probabilities $p$. We add a sum node $n_{\text{cur}}$ and
enqueue a call to the continuation for each value in $v$.

Terminal values cause indicator nodes to be built. If 
a value is new for the current subproblem r, we no- 
tify all contexts waiting for return values under r by 



calling ProcessTerminal once for each callback asso- 
ciated with r. 
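
The procedure described above can be sketched in Python as follows. This is an illustrative reading of Algorithm 1 under several assumptions, not the authors' implementation: the classes `RandomChoice`, `Subcall`, and `Graph` are minimal stand-ins, the interpreter is the game program of Figure 1a hand-written in coroutine form, and, unlike Algorithm 1 as printed, the initial argument is registered in `subproblem` so that the recursive call back to (game true) reuses the first root.

```python
from collections import defaultdict, deque

class RandomChoice:
    def __init__(self, cont, values, probs):
        self.cont, self.values, self.probs = cont, values, probs

class Subcall:
    def __init__(self, cont, arg):
        self.cont, self.arg = cont, arg

def game(player):
    # Figure 1a in coroutine form: return partial results instead of sampling.
    def after_flip(recurse):
        if recurse:
            return Subcall(lambda v: not v, not player)
        probs = (0.2, 0.8) if player else (0.7, 0.3)
        return RandomChoice(lambda v: v, (True, False), probs)
    return RandomChoice(after_flip, (True, False), (0.6, 0.4))

class Graph:
    def __init__(self):
        self.nodes, self.edges, self.root = [], [], {}
    def add_node(self, kind, payload=None, parent=None):
        n = len(self.nodes)
        self.nodes.append((kind, payload))
        # Every node remembers the root of the subproblem it belongs to.
        self.root[n] = n if kind == "root" else self.root.get(parent)
        return n
    def add_edge(self, src, dst, weight):
        self.edges.append((src, dst, weight))

def process_terminal(G, Q, n_root, v, n_prev, c):
    n_prod = G.add_node("product", parent=n_prev)
    n_ref = G.add_node("ref", (n_root, v), parent=n_prev)
    G.add_edge(n_prev, n_prod, 1.0)
    G.add_edge(n_prod, n_ref, 1.0)
    Q.append((lambda: c(v), n_prod, 1.0))  # continue in the caller's context

def build_fspn(interp, x_init):
    G, Q = Graph(), deque()
    r0 = G.add_node("root", x_init)
    subproblem = {x_init: r0}  # assumption: register the initial call as well
    terminals, callbacks = defaultdict(set), defaultdict(list)
    Q.append((lambda: interp(x_init), r0, 1.0))
    while Q:
        f, n_prev, w_prev = Q.popleft()
        x = f()
        if isinstance(x, RandomChoice):
            n_cur = G.add_node("sum", parent=n_prev)
            for v, p in zip(x.values, x.probs):
                Q.append((lambda c=x.cont, v=v: c(v), n_cur, p))
        elif isinstance(x, Subcall):
            n_cur = G.add_node("sum", parent=n_prev)
            c, s = x.cont, x.arg
            if s not in subproblem:
                r = subproblem[s] = G.add_node("root", s)
                Q.append((lambda s=s: interp(s), r, 1.0))
            else:
                r = subproblem[s]
                for v in list(terminals[r]):
                    process_terminal(G, Q, r, v, n_cur, c)
            callbacks[r].append((n_cur, c))
        else:  # terminal value
            n_cur = G.add_node("indicator", x, parent=n_prev)
            r = G.root[n_prev]
            if x not in terminals[r]:
                for n_cb, c in callbacks[r]:
                    process_terminal(G, Q, r, x, n_cb, c)
                terminals[r].add(x)
        G.add_edge(n_prev, n_cur, w_prev)
    return G, subproblem, terminals

G, subproblem, terminals = build_fspn(game, True)
```

For this program the loop terminates with two root nodes, one per distinct call, even though the program itself admits infinitely many execution paths.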

4.4 Solving FSPNs 

A FSPN corresponds to a system of equations (Section
3). For probabilistic programs, this system tends to be
sparse, reflecting the fact that, in general, most inter- 
preter calls that occur in the process of enumerating 
a given program do not depend on most other calls. 
We therefore cluster the equations into strongly con- 
nected components and solve the clusters of equations 
in topological order. Computing a topological order
of strongly connected components is linear in the size
of the graph (Tarjan, 1972). By solving in topological
order we know that all probabilities required to compute
the solution of a component have been computed
once we reach this component. 

In our examples, we use a simple substitution-based
equation simplifier, fixed-point iteration, and Newton's
method to solve these equations. Exploring the
use of other solution methods is a potential avenue for
future performance improvements.
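
The following Python sketch illustrates the solving strategy on the game program of Figure 1a: equations are grouped into strongly connected components with Tarjan's algorithm, and each component is solved by fixed-point iteration once its dependencies are known. The equation encoding (a dictionary of update functions plus a dependency map), the variable names, the zero starting point, and the tolerance are assumptions of this sketch; it does not include the equation simplifier or Newton's method.

```python
def strongly_connected_components(deps):
    """Tarjan's algorithm. `deps[v]` lists the variables that v depends on.

    Components are returned dependencies-first, i.e. in an order in which
    they can be solved one after another. (Recursive, so only suitable for
    small systems in this form.)
    """
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in deps[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    for v in deps:
        if v not in index:
            visit(v)
    return comps

def solve(equations, deps, tol=1e-12, max_iter=10000):
    values = {v: 0.0 for v in equations}
    for comp in strongly_connected_components(deps):
        for _ in range(max_iter):
            delta = 0.0
            for v in comp:
                new = equations[v](values)
                delta = max(delta, abs(new - values[v]))
                values[v] = new
            if delta < tol:
                break
    return values

# "T=true" stands for P((game true) = true), and so on.
equations = {
    "T=true":  lambda v: 0.6 * v["F=false"] + 0.4 * 0.2,
    "T=false": lambda v: 0.6 * v["F=true"] + 0.4 * 0.8,
    "F=true":  lambda v: 0.6 * v["T=false"] + 0.4 * 0.7,
    "F=false": lambda v: 0.6 * v["T=true"] + 0.4 * 0.3,
}
deps = {
    "T=true": ["F=false"], "T=false": ["F=true"],
    "F=true": ["T=false"], "F=false": ["T=true"],
}
marginal = solve(equations, deps)
```

For this linear system the iteration converges to P((game true) = true) = 0.2375; nonlinear systems arising from other programs may need a different starting point or solver.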

5 EMPIRICAL EVALUATION 

In this section, we describe three situations where we 
have found generic dynamic programming to be use- 
ful: teaching probabilistic models, research in compu- 
tational cognitive science, and analysis of multi-agent 
reasoning in game-theoretic situations. We present an 
example for each of these situations and compare dy- 
namic programming to other inference algorithms. 

In teaching probabilistic models, we usually aim 
to present modeling and inference patterns before we 
discuss the internals of inference algorithms, since the 
former provide motivation for the latter. The Probabilistic
Models of Cognition tutorial by Goodman et al.
(2011) follows this approach and has been used in graduate
classes at MIT and Stanford. Since implementations
of probabilistic programming languages supply
universal inference algorithms, it is nonetheless
possible to allow students to experiment with models
and solve exercises.

However, the performance of existing "universal" algo- 
rithms strongly depends on the structure of the models 
they are applied to, even for models with a very small 
number of variables. Rejection sampling is only feasi- 
ble as long as we do not condition on low-probability 
events; MCMC requires that the distribution does not 
have modes that are isolated with respect to the pro- 
posal structure of the algorithm. 

Even in cases where sampling is feasible, it poses a 



(query
  ;; Generative model
  (define team1 (list 0 1))
  (define team2 (list 2 3))
  (define strengths
    (repeat 4 (λ () (if (flip) 10 5))))
  (define (strength person)
    (list-ref strengths person))
  (define (lazy person)
    (flip (/ 1 3)))
  (define (total-pulling team)
    (sum
      (map
        (λ (person)
          (if (lazy person)
              (/ (strength person) 2)
              (strength person)))
        team)))
  (define (winner team1 team2)
    (if (< (total-pulling team1)
           (total-pulling team2))
        'team2
        'team1))

  ;; Query expression
  (list (strength 0) (strength 1))

  ;; Condition
  (and
    (eq? 'team1 (winner team1 team2))
    (eq? 'team2 (winner team1 team2))))

Figure 3: The rope-pulling game, a simple generative
model used in teaching probabilistic modeling.



challenge to students: it can be difficult to distin- 
guish approximation noise from systematic inference 
patterns. For Metropolis-Hastings in the space of program
traces (Goodman et al., 2008; Wingate et al., 2011),
quantitative analysis of mixing times does not
exist, hence analysis of convergence can be difficult
even for experts; students' lack of background knowl- 
edge exacerbates this effect. 

For example, consider the rope-pulling game (Figure
3), a simple probabilistic program without nested conditioning.
Figure 4 shows how the L1 error between
the estimated and true posterior distribution devel- 
ops over time for rejection, MCMC, and dynamic pro- 
gramming. While MCMC has difficulty mixing be- 
tween modes, and while rejection computes estimates 
using very few samples due to a low-probability condi- 
tion, dynamic programming deterministically returns 
the exact answer after about 6 seconds. 

In cognitive science research, we wish to quickly 
explore a wide range of model variations. While the 
model prototypes used in research have a tiny num- 
ber of variables compared to the state of the art in 
Figure 4: Convergence to true distribution for the rope-pulling model. Each point represents the L1
error between the estimated and true distribution for a given runtime and algorithm. While all algorithms
eventually converge to the correct distribution for this model used in teaching, the only algorithm that quickly
provides a precise answer is dynamic programming. In this example, MCMC has difficulty mixing between
modes. Rejection computes estimates using very few samples due to a low-probability condition.



machine learning, they are structurally complex and 
use features such as mutual recursion, nested condi- 
tioning, and stochastic higher-order functions. Proba- 
bilistic programming makes it possible to explore this 
space without building custom inference algorithms. 

The caveat that the performance of current sampling- 
based algorithms strongly depends on models, even in 
small state spaces, applies here as well. For example,
Goodman and Stuhlmüller (2012) proposed a model of
language understanding based on the idea that listeners
assume that speakers choose their utterances approximately
optimally, and that listeners interpret an
utterance by using Bayesian inference to "invert" this
model of the speaker. Figure 5 shows part of a model
of this type that predicts an interaction between the 
speaker's state of knowledge and the listener's inter- 
pretation of scalar implicatures (e.g., "some" implies 
"not all"). Using dynamic programming, the time it 
takes to compute the marginal distribution for this 
model grows linearly in the depth of recursive reason- 
ing, whereas for current sampling techniques, inference 
time grows exponentially. 

Moreover, some of the model features that are of 
research interest do not easily fit into the sampling 
framework. For example, in softmax-optimal decision-making,
an action $a$ is chosen according to exponentiated
expected utility under a belief distribution $P(s)$,
i.e., $P(a) \propto \exp(\alpha \, \mathbb{E}_{P(s)}[U(a; s)])$. A direct translation
into a probabilistic language with sampling semantics
seems to require additional programming constructs
that reify distributions. Such constructs can be
provided more easily in the setting of exact inference.
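
The role of a reified distribution can be seen in a small Python sketch of the softmax-optimal rule. The function name `softmax_optimal`, the dictionary representation of the belief, and the umbrella example with its utilities are all hypothetical choices made for illustration.

```python
import math

def softmax_optimal(actions, belief, utility, alpha):
    """Return P(a) proportional to exp(alpha * E_{s ~ belief}[utility(a, s)]).

    `belief` is an explicit distribution (state -> probability); the exact
    expectation is only available because the distribution is reified.
    """
    expected = {
        a: sum(p * utility(a, s) for s, p in belief.items())
        for a in actions
    }
    weights = {a: math.exp(alpha * eu) for a, eu in expected.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical example: decide whether to carry an umbrella.
belief = {"rain": 0.3, "sun": 0.7}
utilities = {
    ("umbrella", "rain"): 1.0, ("umbrella", "sun"): -0.2,
    ("none", "rain"): -1.0, ("none", "sun"): 0.5,
}
policy = softmax_optimal(
    ["umbrella", "none"], belief, lambda a, s: utilities[(a, s)], alpha=2.0
)
```

With a marginalizer available, `belief` could itself be the exact marginal distribution of another probabilistic program.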

The analysis of multi-agent reasoning in game- 
theoretic situations shares many properties with 
cognitive science research, but places even more em- 
phasis on multiply nested conditioning. This com- 
monly rules out existing sampling-based algorithms. 
At the same time, enumeration is often not an option 
either, since exploiting shared structure is critical in 
reducing the state space to tractable size: in the anal- 
ysis of multiple agents thinking about one another, we 
can share computation between all agents, actual and 
counterfactual, that are modeled as being in the same 
state of mind. 

As a particularly difficult example, consider the "blue-eyed
islanders" puzzle, a well-known problem in epistemic
logic (Tao, 2008). The setup is as follows: There
is a tribe on a remote island. Out of the n people in
this tribe, m have blue eyes. Their religion forbids 
them to know their own eye color, or even to discuss 
the topic. Therefore, everyone sees the eye color of 
every other islander, but does not know their own eye 
color. If an islander discovers their color, they have to 
publicly announce this at noon and leave the island. 
All islanders are highly logical. One day, a foreigner 
comes to the island and — speaking to the entire tribe — 
he says: "At least one of you has blue eyes." What 
happens next? Results for a stochastic version of this
puzzle are shown in Figure 6.

The difficulty of this model stems from the fact that 
each day, every islander reasons about the reasoning of 
all of the other islanders on the previous day, and that 



(define (speaker access state depth)
  (query
    (define sentence (sentence-prior))
    sentence
    (equal? (belief state access)
            (listener access sentence depth))))

(define (listener sp-access sentence depth)
  (query
    (define state (state-prior))
    state
    (if (= depth 0)
        (sentence state)
        (equal? sentence
                (speaker sp-access state
                         (- depth 1))))))

Figure 5: Increase in dynamic programming inference time as a function of nested conditioning
depth. For this model used in cognitive science research, dynamic programming makes it possible to explore
nested recursive conditioning with linear growth of inference time in the depth of recursion. Each point on the
plot corresponds to a run of our algorithm on the model with a given depth. For rejection and MCMC over
rejection, expected inference time grows exponentially.



their reasoning must again include all islanders' rea- 
soning on the day before the previous day, etc. How- 
ever, due to the symmetry of the setup, all islanders 
with blue eyes and all islanders without blue eyes do 
the same computation on any given day. Their com- 
putations are merged by our algorithm, which makes 
exact inference feasible for small populations. 

6 RELATED WORK 

Our algorithm is related to and inspired by a long tra- 
dition of algorithms which use dynamic programming 
to exploit reusable structure in the natural language 
processing, logic programming, and functional programming
literatures. For example, it is known that,
in general, exactly solving problems such as marginalization
for arbitrary recursive programs leads to systems
of nonlinear equations (see, e.g., comments in
Eisner et al., 2005). Klein and Manning (2001) exploit
strongly connected components of the computation
graph for PCFGs to perform efficient exact marginalization
in a way similar to the present algorithm. It is
beyond the scope of this paper to review the many
connections with individual algorithms presented in
the literature. Instead, we focus on three systems
which attempt to use dynamic programming to provide
general inference algorithms for universal, probabilistic
(or, more generally, weighted) programming languages:
IBAL, PRISM, and Dyna.

The most closely related system to the present work is
the functional programming language IBAL, a probabilistic
variant of ML (Pfeffer, 2001). IBAL provides
an exact marginalization algorithm for discrete
probabilistic models, which is based on a generaliza- 
tion of variable elimination applied to computation 
graphs. The graph used by this algorithm also ex- 
ploits sharable subcomputations across the evaluation 
of the probabilistic program. However, the present 
algorithm is more general than the IBAL algorithm 
in an important way. The IBAL algorithm relies on 
acyclic computation graphs; this is equivalent to the
requirement that the computation be evidence-finite
(Koller et al., 1997): there must only be a finite number
of computations which can give rise to the observed
evidence. By contrast, our algorithm handles
many cases of evidence-infinite computation. For example,
the simple recursive program shown in Figure
1a, which has finite support {true, false} but an infinite
number of computations which give rise to each
support value, cannot be marginalized by IBAL, but 
is correctly handled by our algorithm. Practical exam- 
ples of such evidence-infinite computations include the 
nested-query models for multi-agent reasoning that we 
have described above. 

Another system which is similar to the present work is 
PRISM, a probabilistic generalization of Prolog, which 
also makes use of dynamic programming to provide 
a general inference algorithm. Although PRISM is 
able to recover many standard algorithms for problems 
such as PCFG estimation (e.g., the inside-outside algorithm),
like IBAL, it cannot handle evidence-infinite
computations (Sato, 2009).

Figure 6: While the blue-eyed islanders puzzle is challenging for all generic inference algorithms, dynamic
programming allows predictions for small population sizes that are already intractable for MCMC and rejection.
The figure shows results for population size 4, all islanders blue-eyed: on the 4th day, it is highly likely that all
islanders decide to leave.



A somewhat different approach is Dyna (Eisner
et al., 2005), a programming language for expressing
weighted deductive logic programs. Dyna makes use of
generalizations of parsing-as-deduction (Shieber et al.,
1995) and semi-ring parsing (Goodman, 1999) to compile
weighted logic programs into highly optimized dynamic
programs. Dyna differs from our algorithm in
the target level of abstraction. Our algorithm is fo- 
cused on the problem of rapid prototyping of mod- 
els for which no standard dynamic programming al- 
gorithm exists. The programmer simply provides an 
interpreter, and our algorithm automatically exploits 
whatever sharing is exposed by the structure of the 
recursive calls made in the process of computing the 
marginal distribution for a particular model. By con- 
trast, Dyna is a language for abstractly expressing spe- 
cific dynamic programming algorithms and compiling 
these algorithms to highly efficient code. It allows the 
programmer lower-level control over algorithm specifi- 
cation, but it also requires the programmer to specify 
these algorithmic details. 

7 CONCLUSION 

We have developed a dynamic programming algorithm 
for exact inference in probabilistic programs. We have 
illustrated how this algorithm aids the use of prob- 
abilistic programs in teaching and research. Future 
work includes incorporating techniques from other ap- 
proaches to dynamic programming (such as evidence 
propagation from IBAL, and efficient code generation 
from Dyna) and exploring techniques for approximate 
dynamic programming. 